Abstract

We address a primary question of computational as well as biological research on evolution: How can 
an exploration strategy adapt in such a way as to exploit the information gained about the problem at 
hand? We first introduce an integrated formalism of evolutionary search which provides a unified view 
on different specific approaches. On this basis we discuss the implications of indirect modeling (via a
"genotype-phenotype mapping") on the exploration strategy. Notions such as modularity, pleiotropy
and functional phenotypic complex are discussed as implications. 

Then, rigorously reflecting the notion of self-adaptability, we introduce a new definition that captures 
self-adaptability of exploration: different genotypes that map to the same phenotype may represent 
(also topologically) different exploration strategies; self-adaptability requires a variation of exploration 
strategies along such a "neutral space". By this definition, the concept of neutrality becomes a central
concern of this paper. 

Finally, we present examples of these concepts: For a specific grammar-type encoding, we observe a 
large variability of exploration strategies for a fixed phenotype, and a self-adaptive drift towards short 
representations with a highly structured exploration strategy that matches the "problem's structure".

Keywords 

Exploration, self-adaptability, evolvability, neutrality, modularity, pleiotropy, functional phenotypic 
complex. 

1 Introduction 

Typically, when a problem is given, the space of all potential solutions is too large to try all
of them in reasonable time. Without further assumptions about the problem, no search
strategy is preferable to any other. Usually though, one assumes
that the problem is not completely arbitrary, that it has some "structure" and that there
might exist some smart strategies to explore the space. More specifically, one hopes that
one can draw information from the quality of previously explored solutions on how to choose
new explorations. For example, when assuming some "continuity" of the problem, one may
search further in regions of previously explored good solutions.

A more elaborate strategy is the following: Analyze the statistics of previously found
solutions, find correlations between certain characters (parameters) of the solution and the
solution's quality, find mutual information between the characters of good solutions, etc., and
exploit all this information to choose further explorations, in the hope that these findings
really characterize the problem, i.e., that the problem is characterizable by such information. In
essence, the latter approach will explore only a tiny part of the search space P, strongly
dependent on early explorations that have been successful. Solutions found this way cannot
claim to be globally optimal; they are further developments of early successful concepts.

The central questions become: How can we analyze the statistics of previously explored 
and evaluated solutions? How can we represent this gained information? How can we model 
an exploration strategy depending on this information? 

One direct approach to these questions leads to statistical models of exploration. For
example, a Bayesian network can encode the probability of future explorations (the exploration
density) and is trained with previously successful solution parameters (as done by Pelikan,
Goldberg, and Cantú-Paz (2000), see appendix A). In contrast, we will argue that the
exploration strategy can be modeled by a mapping onto the solution space, a genotype-phenotype
mapping. This means that a (simple) density on a base space (genotype) is lifted to the
exploration density on the search space (phenotype). The implications of such an ansatz
are far-reaching: An exploration density now exists on both the base space and the search
space. In both spaces, notions such as neighborhood or topology should be constituted only by the
exploration density. In this respect, the genotype-phenotype mapping is a lift of (topological)
structure from the base space to the search space.

To investigate the implications, we assume that the exploration density on the base space
is one of independent random variables. Then, for a given mapping, we investigate the
exploration density on the search space; in particular the correlations and mutual information
between phenotypic variables. This structuredness of phenotypic exploration coherently
implies notions such as "modularity" and "functional phenotypic complex". Concerning the adaptation
of this structure, we will argue for a self-adaptive mechanism, in place of a statistical analysis
of characters of good solutions (as with the Bayesian ansatz). A major goal of this paper is
formal and notational clarity on such issues.

The paper is organized as follows: The next section starts by introducing a general notation 
of evolutionary search. This notation emphasizes the role of the exploration density in the 
search space and, even more, the way this density is parameterized. We call the latter the
"exploration model". An important point of this section is that most evolutionary algorithms
differ just by this exploration model. Since it distracts from the main line of this paper, we
moved this reinvestigation of existing evolutionary algorithms to appendix A.

In section 3 we introduce and formalize the idea of indirect modeling. Instead of
parameterizing the exploration density directly on the search space, we introduce the additional
base space, parameterize a simple density thereon, and lift this density onto the search space.
We compare this to the lift of a topological or metrical structure onto a manifold from a
simple-structured base space. Notions such as pleiotropy and functional phenotypic complex are
discussed as implications of such a lift. We also relate the biological view on indirect modeling
(here, via a genotype-phenotype mapping) and adaptive exploration to our formalism.

1 which requires identifying a topology on the search space

Section 4 begins by reflecting and criticizing the usual definition of self-adaptability. We 
introduce a new definition which is based on the notion of neutrality: Different genotypes 
that map to the same phenotype may represent (also topologically) different exploration 
densities. Thus, such genotypes may represent very different information and neutrality is 
not necessarily a form of redundancy as is often claimed. By this definition, neutrality becomes 
a central concern and we briefly review other research on this subject in order to argue for 
the plausibility of our interpretation. 

Finally, in section 5 we exemplify all these concepts with a running system. Simulations 
show that the exploration density adapts to the problem structure by (self-adaptive) walks 
on neutral sets. In particular, the pair-wise mutual information between phenotypic variables 
resembles the modularity of the fitness function. We also observe and explain a drift towards 
short representations. The experiments are based on a grammar-type recursive encoding 
which is thoroughly motivated by the previously developed concepts. 

2 The central role of the exploration model 

The goal of this section is to show that the central concern of evolutionary search, esp. 
evolutionary algorithms, is the modeling of exploration. We will show that the main difference 
between specific evolutionary algorithms is their ansatz to model exploration. 

Perhaps the most general idea of stochastic search, global random search, is described by 
Zhigljavsky (1991). The formal scheme of global random search reads: 

(i) Choose a probability distribution on the search space P. 

(ii) Obtain points $s_1^{(t)}, \ldots, s_\lambda^{(t)}$ by sampling $\lambda$ times from this distribution. Evaluate the
quality of these points.

(iii) According to a fixed (algorithm dependent) rule construct a new probability distribution
on P.

(iv) Check some appropriate stopping condition; if the algorithm is not terminated, then
substitute $t \leftarrow t + 1$ and return to step (ii).
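
The four steps above can be sketched as a short loop. This is only an illustration, not Zhigljavsky's formulation: the search space (bit strings), the product-Bernoulli distribution, the quality measure `onemax`, the fixed step budget, and the update rule (refit the distribution to the better half of the sample) are all assumptions chosen for brevity.

```python
import random

def onemax(s):
    """Hypothetical quality measure: number of ones in the bit string."""
    return sum(s)

def global_random_search(n_bits=10, lam=20, steps=30, seed=0):
    rng = random.Random(seed)
    # (i) choose a distribution on P = {0,1}^n: independent Bernoulli(0.5) bits
    probs = [0.5] * n_bits
    best = None
    for t in range(steps):  # (iv) a fixed budget serves as the stopping condition
        # (ii) sample lam points and evaluate their quality
        sample = [[int(rng.random() < p) for p in probs] for _ in range(lam)]
        sample.sort(key=onemax, reverse=True)
        if best is None or onemax(sample[0]) > onemax(best):
            best = sample[0]
        # (iii) fixed rule: refit the distribution to the better half of the
        # sample, clipped so that every point keeps nonzero probability
        top = sample[: lam // 2]
        probs = [min(0.95, max(0.05, sum(s[i] for s in top) / len(top)))
                 for i in range(n_bits)]
    return best, probs
```

Calling `global_random_search()` returns the best point seen and the final distribution parameters; different update rules in step (iii) would yield different algorithms within the same scheme.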

This concept is general enough to include evolutionary algorithms as well. However, the
formulation fails to stress that the exploration density needs to be parameterized (and instead
stresses the choice of update rule in step (iii)). We will stress the parameterization of
exploration densities and call it the exploration model. It is this model that we focus on. We now
formalize evolutionary search in analogy to global random search, but with a different focus:

In general we assume that the task is to find an element p in a search space P which is
"superior" to all other points in P. Here, superiority is defined in terms of a quality measure
for the search problem at hand (usually a fitness function). If P is too large to evaluate the
quality of all $p \in P$, the strategy is to explore only a few points $(p_1, \ldots, p_\lambda)$, evaluate their
quality, and then try to extract information on where to perform further explorations. We
capture this view on evolutionary search in an abstract formalism that is capable of unifying the
different specific approaches. Below, we exemplify each step of the scheme by embedding the
Simple Genetic Algorithm (SGA) (Vose 1999) in the formalism. See also figure 1.

Definition 1 (Evolutionary exploration)
(i) The only information maintained for evolutionary search is a finite set of parameters
$q^{(t)} \in Q$ that uniquely define an exploration density $M_{q^{(t)}}$ on P. Here, we call M
the exploration model, actually a map from Q to the space $\Lambda$ of densities over P. In
general, the variety $M_Q = \{M_q \mid q \in Q\}$ of representable densities is limited.

(ii) Given some parameters $q^{(t)}$, exploration starts by choosing $\lambda$ samples
$s_1^{(t)}, \ldots, s_\lambda^{(t)}$ of the exploration density. We use brackets to indicate this sampling:

    $s^{(t)} = [M_{q^{(t)}}]^\lambda \in P^\lambda$ .    (1)

Here and in the following, we disregard the possibility of elitists. Taking them into
account would require appending selected points $(p_1, \ldots, p_\mu)$ of P to the sample:

    $s^{(t)} = [M_{q^{(t)}}]^\lambda \oplus (p_1, \ldots, p_\mu) \in P^{\mu+\lambda}$ .    (2)

(iii) We require the existence of an evaluation $E : P^\lambda \to \Lambda$ which maps the exploration
sample $(s_1, \ldots, s_\lambda)$ to a density over P with support $\{s_1, \ldots, s_\lambda\}$. This evaluation is applied
to our exploration points:

    $E_{s^{(t)}} = E([M_{q^{(t)}}]^\lambda) \in \Lambda$ .    (3)

One should interpret E as "density of quality" rather than a probability density.

(iv) Finally, there exists an update operator

    $A : q^{(t)} \times E_{s^{(t)}} \mapsto q^{(t+1)}$ .    (4)

In general, this operator is supposed to exploit the information in $E_{s^{(t)}}$.

Example (The Simple Genetic Algorithm)
(i) The SGA (without crossover) is a typical example of population-based modeling:
$q^{(t)} = (p_1, \ldots, p_\mu) \in P^\mu$ is a discrete population and $M_{p_i}$ specifies the offspring density
for each single individual. We call the $M_{p_i}$ exploration kernels. The total exploration
density reads

    $M_q = \frac{1}{\mu} \sum_{i=1}^{\mu} M_{p_i}$ .    (5)

We note that the key feature of population-based modeling is its capacity to represent
multi-modal exploration densities.

(ii) In the SGA, the samples $s^{(t)}$ are new offspring. The algorithm does not explicitly construct the
complete exploration density $M_{q^{(t)}}$; rather, the drawing of mutations for each individual
amounts to a sampling of the exploration kernels.

(iii) For the SGA, evaluation is proportional to a given fitness function.

(iv) The update rule of the SGA can be written as

    $q^{(t+1)} = [E([M_{q^{(t)}}]^n)]^n$ .    (6)

In words: From the parent population generate n offspring $[M_{q^{(t)}}]^n$, evaluate their
fitness and select n new individuals by sampling their evaluation.
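
Read as code, the SGA example amounts to the step below: sample the exploration kernels by mutating parents, evaluate proportionally to fitness, and resample the population from that evaluation. This is a sketch under assumptions that are not taken from the text: bit-string phenotypes, exactly one offspring per parent, a per-bit mutation rate `p_mut`, and a toy fitness (number of ones, offset by one so that all weights stay positive).

```python
import random

def fitness(p):
    # hypothetical fitness: number of ones, +1 so that all weights are positive
    return sum(p) + 1

def mutate(p, p_mut, rng):
    """Sample the exploration kernel M_p: flip each bit with probability p_mut."""
    return tuple(b ^ (rng.random() < p_mut) for b in p)

def sga_step(population, p_mut, rng):
    # (ii) offspring: one draw from each parent's exploration kernel
    offspring = [mutate(p, p_mut, rng) for p in population]
    # (iii) evaluation: a density on the offspring, proportional to fitness
    weights = [fitness(s) for s in offspring]
    # (iv) update: the new population is a sample of that evaluation density
    return rng.choices(offspring, weights=weights, k=len(population))

rng = random.Random(1)
population = [(0,) * 8 for _ in range(20)]
for _ in range(50):
    population = sga_step(population, 0.05, rng)
```

Note that the exploration density is never built explicitly; only its kernels are sampled, which mirrors point (ii) of the example.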

One might assume that evolutionary algorithms mostly differ with respect to the update 
rule. However, we claim that the choice of the exploration model is crucial and that, given 
such a model, two generic update operators are canonical and widely in use: 

Definition 2 (Adopting and approaching updates)

It is the adopting update to choose the update operator such that $M_{q^{(t+1)}}$ is a best possible
approximation of $E_{s^{(t)}}$ within the model class $M_Q$ (with respect to some chosen metric D
on $\Lambda$):

    $q^{(t+1)} = \operatorname{argmin}_{q \in Q} D(M_q : E_{s^{(t)}})$ .    (7)

We will abbreviate this formula by using the simplified notation $A = M^{-1}$:

    $q^{(t+1)} = M^{-1}(E_{s^{(t)}})$ .    (8)

Second, many algorithms realize not an adopting but rather an approaching update by
slowly adapting $q^{(t)}$. Here, the parameters must be continuous. The generic update rule
reads

    $q^{(t+1)} = (1 - \alpha) \, q^{(t)} + \alpha \, M^{-1}(E_{s^{(t)}})$ ,    (9)

for some constant $\alpha \in [0, 1]$.
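
To make the two updates concrete, the following sketch instantiates them for one particular model class. The assumptions are ours, not the paper's: the model is a product of independent Bernoulli variables over bit strings (so q is a vector of probabilities), the evaluation density is given as weights on the sample, and D is taken to be the Kullback-Leibler divergence from the evaluation density to the model, in which case the best fit is obtained by matching the weighted marginals.

```python
def model_inverse(samples, weights):
    """M^{-1}(E) for a product-Bernoulli model: the weighted marginals of the
    sample, i.e. the minimizer of KL(E || M_q) (assumed choice of D)."""
    total = sum(weights)
    n = len(samples[0])
    return [sum(w * s[i] for s, w in zip(samples, weights)) / total
            for i in range(n)]

def adopting_update(q, samples, weights):
    # eq. (8): jump directly to the best approximation of the evaluation density
    return model_inverse(samples, weights)

def approaching_update(q, samples, weights, alpha=0.1):
    # eq. (9): move a fraction alpha from the old parameters toward M^{-1}(E)
    target = model_inverse(samples, weights)
    return [(1 - alpha) * qi + alpha * ti for qi, ti in zip(q, target)]
```

With `alpha = 1` the approaching update coincides with the adopting update; smaller values retain more of the information accumulated in earlier generations.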
Figure 1: The general scheme of evolutionary search.



Example (Update operator of the SGA) 

The update operator of the SGA is strongly related to the adopting update: The sampling
$[E_{s^{(t)}}]^n$ of the evaluation density can be interpreted as "finding new parameters that
approximate $E_{s^{(t)}}$ in the population-based model". The quality of this approximation is
reflected by the sampling error.

Both of these canonical update operators are derived from $M^{-1}$. Thus, by showing
that most existing evolutionary algorithms realize these operators, we stress the
importance of the choice of exploration model. Note that any algorithm, when embedded in the
above formalism, is uniquely characterized by the choice of model M, the update operator A
(possibly derived from M), the evaluation E (given by the problem at hand) and the sample size $\lambda$.

It is, of course, possible to think of exceptions that cannot be embedded in this formalism. 
However, in appendix A we show how the formalism allows an embedding of - and a unified 
view on - very different state-of-the-art evolutionary algorithms. Indeed, those evolutionary 
algorithms mainly differ with respect to their exploration model. 

3 An indirect model of exploration 

Having stressed the importance of exploration modeling, we now concentrate on the specific case
of modeling defined as follows:

Definition 3 (Indirect exploration modeling)

To model an exploration density over P, introduce a base space $G = X^n$ and a base
density $M^G_q$ over G such that the variables $x \in X$ are independent with respect to $M^G_q$.
Then, introduce a GP-map $h : G \to P$ that induces the exploration density $M_q = M^G_q \circ h^{-1}$
over P. Here, $h^{-1}(p) \subset G$ is a subspace of G called neutral space of $p \in P$; and
$M^G_q \circ h^{-1}(p)$ is evaluated via integration. The class of allowed GP-maps and base densities
limits this model M. The triplet $(G, h, M^G)$ is also referred to as coding.
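
For a finite base space, the lift in this definition can be computed by enumeration. The sketch below assumes binary genes and a made-up non-injective GP-map; the function name `induced_density` and the map `h` are hypothetical and serve only to illustrate how the density on P is obtained by summing the base density over each neutral space.

```python
from itertools import product
from collections import defaultdict

def induced_density(h, gene_probs):
    """Lift a base density of independent binary genes through the GP-map h.

    gene_probs[i] is the probability that gene i equals 1. Returns the induced
    density {phenotype: probability} and the neutral spaces {phenotype: h^{-1}(p)}.
    """
    density = defaultdict(float)
    neutral_spaces = defaultdict(list)
    for g in product((0, 1), repeat=len(gene_probs)):
        # independence: the base density factorizes over the genes
        prob = 1.0
        for bit, p1 in zip(g, gene_probs):
            prob *= p1 if bit else 1.0 - p1
        p = h(g)
        density[p] += prob  # the "integration" over h^{-1}(p) is a sum here
        neutral_spaces[p].append(g)
    return dict(density), dict(neutral_spaces)

def h(g):
    # hypothetical non-injective GP-map: phenotype = (g0 XOR g1, g2)
    return (g[0] ^ g[1], g[2])

density, neutral_spaces = induced_density(h, [0.5, 0.5, 0.5])
```

In this toy case each phenotype has a neutral space of two genotypes, for example `(0, 0, 0)` and `(1, 1, 0)` for the phenotype `(0, 0)`.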

In the following, in order to refer to their biological interpretations, we will also use the
names phenotype space for the search space P, genotype space for the base space G, and
genotype-phenotype mapping for h.
Also, we call the independent variables $x \in X$ genes and say we introduce genes on
P when introducing such a GP-map and stressing the introduction of a representation via
independent variables. This can be seen in analogy to the introduction of local coordinates
on a manifold by a local map from a base space of (Cartesian) variables. There is, however,
a crucial difference: The map h does not need to be one-to-one. If h is non-injective, there
exist different genotypes $g_i$ that map to the same phenotype. Then there exist different
neighborhoods $U_{g_i}$ that map to possibly different neighborhoods of the same phenotype.
This change of neighborhood is of major interest. It allows a variability of exploration. The
next section will address this important issue in detail.

As an example of indirect modeling, note that the CMA (see appendix A) may be
interpreted as indirect modeling: it restricts the class of GP-maps to affine transformations; the
translational part is encoded in the population's center of mass and the linear part is encoded
in the covariance matrix; the base space is $G = \mathbb{R}^n$ with normal density $\mathcal{N}(0, 1)$.



3.1 Characters of indirect exploration: Pleiotropy, mutual information, lift 
of topology, neutrality 

The introduction of a GP-map leads to some straightforward definitions and notions. We use 
this section to briefly introduce some. 



Pleiotropy. In a biological context one may define pleiotropy as "the phenomenon of one
gene being responsible for or affecting more than one phenotypic characteristic". Our previous
definitions allow us to translate this notion into our formalism: Genes are independent (with
respect to the base density) variables of G. One gene affecting more than one variable of P
means that the change of one variable in G leads to the change of many variables in P. Thus
pleiotropy means that the base density of independent variables is mapped onto an exploration
density of non-independent variables; pleiotropy may be measured by the correlatedness of
variables of P with respect to the exploration density. We also refer to this as the structure of
the exploration density. In particular, we will measure pleiotropy as the mutual information
contained in the exploration density.
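
As a small worked example of this measure, the sketch below computes the pairwise mutual information between two phenotypic variables under the density induced by uniform, independent binary genes. The base density, the GP-map `h` (in which gene 0 sets two phenotypic variables at once), and the function name are assumptions made purely for illustration.

```python
from itertools import product
from math import log2
from collections import Counter

def mutual_information(h, n_genes, i, j):
    """Mutual information (in bits) between phenotypic variables i and j under
    the density induced by uniform, independent binary genes."""
    joint, marg_i, marg_j = Counter(), Counter(), Counter()
    w = 1.0 / 2 ** n_genes
    for g in product((0, 1), repeat=n_genes):
        p = h(g)
        joint[(p[i], p[j])] += w
        marg_i[p[i]] += w
        marg_j[p[j]] += w
    return sum(pab * log2(pab / (marg_i[a] * marg_j[b]))
               for (a, b), pab in joint.items())

def h(g):
    # hypothetical GP-map: gene 0 is pleiotropic, it sets phenotypic variables 0 and 1
    return (g[0], g[0], g[1])

mi_coupled = mutual_information(h, 2, 0, 1)   # variables driven by the same gene
mi_separate = mutual_information(h, 2, 0, 2)  # variables driven by different genes
```

Here the two variables that share a gene carry one bit of mutual information, while variables controlled by different genes carry none, even though the genes themselves are independent.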



Population-based indirect modeling. Population-based modeling was defined in section
2. We briefly clarify notations in the indirect modeling case: The parameters $q \in Q$ are a
population $(g_1, \ldots, g_\mu) \in G^\mu$ on the base space and the exploration kernels $M^G_{g_i}$ are such that
the total exploration density reads:

    $M_q = M^G_q \circ h^{-1} = \frac{1}{\mu} \sum_{i=1}^{\mu} M^G_{g_i} \circ h^{-1}$ .    (10)



Lift of topology. For population-based modeling, the exploration kernels associate a
density of offspring to each individual. From a topological point of view, this defines a
neighborhood (of most probable offspring) for each individual, referred to as variational topology.
In the case of indirect modeling, the kernels $M^G_g$ on the base space are lifted to kernels
$M_g = M^G_g \circ h^{-1}$ on the search space. This means a lift of topology.

Neutrality. The possibility of a non-injective GP-map h automatically leads to the
definition of neutrality.^2 In particular we define $h^{-1}(p)$ as the neutral set of $p \in P$. Further, the
neutral degree of $g \in G$ is defined as the probability

    $M_g[h(g)] = M^G_g[h^{-1} \circ h(g)]$ .    (11)

This reads: Take some individual $g \in G$ and let $N = h^{-1} \circ h(g)$ be the neutral space "around"
g. Now measure the probability $M^G_g[N]$ of landing in this neutral set when exploring from
g.

Such measures are thoroughly discussed by Schuster (1996) and Fontana and Schuster
(1998) (see also section 4.1). However, in these publications, the variational topology rather
than the probability is emphasized. For completeness we append: Let neighborhoods be
defined in G and let $B_r(g)$ be the r-ball around g in G (those points linked to g by at
least one chain of no more than r neighbors). We call the maximal connected component
$N_g \subset h^{-1} \circ h(g)$ with $g \in N_g$ the neutral network of $g \in G$ and define:

    $|h^{-1}(h(g)) \cap B_1(g)|$    neutral degree of $g \in G$    (12)
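
Both notions of neutral degree can be computed directly for small discrete genotype spaces. The following sketch makes several assumptions that are not part of the text: genotypes are bit tuples, the neighborhood of g consists of its one-bit mutants (g itself is not counted), the exploration kernel is uniform over these mutants, and the GP-map `h` is a made-up majority vote.

```python
def neighbors(g):
    """One-bit mutants of g (assumed neighborhood structure on G)."""
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]

def neutral_degree(h, g):
    # number of neighbors with the same phenotype, cf. eq. (12); g is excluded
    return sum(h(n) == h(g) for n in neighbors(g))

def neutral_probability(h, g):
    # eq. (11) for a kernel that is uniform over one-bit mutants (assumption)
    return neutral_degree(h, g) / len(g)

def h(g):
    # hypothetical GP-map: phenotype is 1 if at least two of the three genes are set
    return int(sum(g) >= 2)

center = neutral_degree(h, (1, 1, 1))  # all three mutants keep the phenotype
edge = neutral_degree(h, (1, 1, 0))    # only one mutant keeps the phenotype
```

The genotypes `(1, 1, 1)` and `(1, 1, 0)` share a phenotype but differ in neutral degree, so a mutation from the second is more likely to change the phenotype than a mutation from the first.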

3.2 Indirect exploration modeling in biology 

One may argue that algorithms such as those discussed in appendix A are hardly plausible in nature and
thus without relevance for biology. What mechanisms should keep track of dependencies in
nature, model distributions by storing a Bayesian network or a covariance matrix, and how
should such knowledge be taken into account when creating new offspring?

Nevertheless, a biologist may in principle ask the same questions; we refer to Wagner and
Altenberg (1996): How is it that some phenotypic characters are obviously correlated and
others are not? How is it that a single gene in Drosophila can trigger the expression of
many others and thereby the growth of a whole eye at different places on the body? The
existence of pleiotropy is obvious; are its specific mechanisms an accident, an inevitability,
or the result of evolutionary optimization? What is optimized when adapting pleiotropy?

The idea of Wagner and Altenberg is that in nature the genotype-phenotype mapping is
adaptable and does adapt in such a way that pleiotropy between independent phenotypic
characters is decreased (in order to allow for an unbiased, parallel search) while pleiotropy
between correlated phenotypic characters may increase (in order to stabilize the optimal
relative value of these characters). For example, pleiotropy between the existence of the
eye's cornea and its photoreceptors is high because one alone won't contribute to selection
probability without the other. In contrast, pleiotropy between characters of the immune
system is low in order to allow a fast, parallel optimization of different protection mechanisms
which each separately contribute to selection probability. We mimic a discussion by picking
some quotations from Wagner and Altenberg (1996) and adding a comment to each:

2 More precisely, if we also consider a fitness function $f : P \to \mathbb{R}$, we refer to non-injectiveness of h as
phenotypic neutrality and to non-injectiveness of f as fitness neutrality. In this paper, only phenotypic
neutrality will be addressed.

Concerning evolvability

"Evolvability is the genome's ability to produce adaptive variants when acted upon by the
genetic system." [sec 5, par 2]
In our words: Evolvability denotes the capability of a system to model a desired exploration 
distribution. 

"The thesis of this essay is, that the genotype-phenotype map is under genetic control and 
therefore evolvable." [sec 2, par 9] 

In the case of indirect modeling, the GP-map induces the exploration density on P.
Concluding, though, that evolvability requires a GP-map being "under genetic control" is questionable
from our point of view. We discuss this point in detail in section 4.

Concerning modularity 

"Modularity is one example of variational property." [sec 1, par 3] 
Modularity is a property of the exploration density. It denotes correlations, i.e. mutual 
information, between variables of P. We discussed such correlations in section 3.1 in the 
context of pleiotropy and structure of exploration. 

Concerning functional phenotypic complexes 

"The key feature is that, on average, further improvements in one part of the system must 
not compromise past achievements." [sec 5, par 10] 

"By modularity we mean a genotype-phenotype map in which there are few pleiotropic 
effects among characters serving different functions, with pleiotropic effects falling mainly 
among characters that are part of a single functional complex." [abstract] 

"Independent genetic representation of functionally distinct character complexes can be 
described as modularity of the genotype-phenotype map." [sec 6, par 1] 

"Evolution of complex adaptation requires a match between the functional relationships 
of the phenotypic characters and their genetic representation." [sec 6, par 6] 
In essence, the exploration density should have the character that some variables in P are
mutually independent while others are dependent. Reflecting that adaptation can only
occur by extracting information from the evaluation density $E_s$, we claim that the notion of a
"functional complex" or a "functionally distinct [phenotypic] character complex" may only
be constituted via this evaluation density $E_s$. More precisely, we define a functional
phenotypic complex as a set of variables of P that are highly dependent on each other (with
high mutual information) but only weakly dependent on other phenotypic characters, all
with respect to the evaluation density $E_s$. The "required match" between these properties
of the exploration distribution and the evaluation distribution motivates the adopting or
approaching update as introduced above.

4 Neutrality as basis of self-adaptability of exploration 

So far, we stressed the importance of exploration modeling and focused on the special case 
of indirect modeling. We did not yet address the problem of how the exploration density can 
be adapted in the indirect modeling case. This section gives an answer by providing a strict 
definition of self-adaptability, which considers neutrality as a key feature. We will also review 
other interpretations of neutrality and argue in favor of our interpretation. 

Obviously, if exploration is described by means of fixed kernels around the positions of
individuals, the exploration density varies when individuals move on. But this does not quite
capture what we actually mean by requiring variable exploration. Rather, it is intuitive to
call for "adaptive codings". The review by Eiben, Hinterding, and Michalewicz (1999) (see also
Smith and Fogarty (1997)) summarizes and classifies such approaches. Their discussion is
based on the assumption that the coding $(G, h, M^G)$ depends on some parameters $x \in X$
called strategy parameters; we write $(G_x, h_x, M^G)$. They classify different approaches by
distinguishing between different choices of X:

(i) X are parameters altered by some deterministic rule (e.g., a function of time) independent
of any feedback from the evolutionary process (deterministic).

(ii) X are parameters depending on feedback from the evolutionary process (adaptive).

(iii) X is part of the genotype (self-adaptive).

Option (i) is of no interest here. It is very important to distinguish between (ii) and (iii). 
Option (ii) means to analyze the evolutionary process, namely the evaluation density and 
the exploration density itself, and deterministically deduce an adaptation. Good examples 
are the algorithms presented in appendix A. Option (iii) means that adaptation becomes a 
stochastic search itself — the search for a good exploration density is itself determined by 
just this exploration. 

Figure 2: Two different points $g_1, g_2$ in G are mapped onto the same point in P. The elliptic
ranges around the points illustrate the exploration kernels by suggesting the range of probable
mutants. Thus, the two points $g_1, g_2$ belong to one neutral set but represent two different
exploration strategies.

However, as formulated above, following option (iii) is quite confusing since, after adding
some strategy parameters X to G, the GP-map h still maps $G \to P$ and it is formally incorrect
to think of h as being parameterized by variables of G. One might want to escape this circle
by splitting G into two parts, the strategy part X and the objective part $\hat{G}$, $G = \hat{G} \times X$.
Then, for some strategy parameters $x \in X$, one may define $h : \hat{G} \times X \to P$, $(g, x) \mapsto h_x(g)$
and call $h_x$ an adaptive GP-map. However, in general it is unclear which part of G is to
be considered as the strategy part and which as the objective part. Only in some cases, e.g. if simply
adding control parameters that have no direct effect on the phenotype (neutral parameters!),
does this splitting seem to be straightforward. Also, one could argue that the mutation rate of
the strategy part is kept very low. Formally and conceptually, though, these arguments are
unsatisfactory and thus we reject the definition of self-adaptability as given by option (iii).
Instead, we circumvent such problems by defining:



Definition 4 (Self-adaptive exploration) 

Given an indirect, population-based model M with GP-map h, exploration at $x \in P$ is
defined to be self-adaptable if the exploration kernel $M_g = M^G_g \circ h^{-1}$ varies for different
$g \in h^{-1}(x)$ in the neutral set of x. The variety $\{M_g \mid g \in h^{-1}(x)\}$ of different exploration
kernels represents the scope of self-adaptability.



What does this definition mean? Assume that one individual $g \in G$ is drifting in a
neutral set $h^{-1}(x)$. Meanwhile, although its image h(g) is not changing at all, the probability
distribution of offspring in P (i.e. the exploration kernel $M_g$ associated with it) may well
change. This is how the definition captures the ability of exploration to adapt. See figure
2 for an illustration.

As a simple example we note that adding (neutral) mutation rate parameters is consistent with
this definition: Changing such strategy parameters actually is a neutral walk but varies the
exploration kernels (e.g. by resizing them). Such and similar methods may be understood
as "local rescalings of neighborhood in P"; distances (probabilities to reach neighbors within
one generation) are rescaled. However, such methods do not aim at varying the variational
topology within P: the probabilities for mutations into the neighborhood change, but the
neighborhood itself is not varied. The generality of our definition also captures the latter
kind of variability and it will be a major goal of this paper to exemplify it by introducing
neutral variations that do vary the variational topology on P.
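
The mutation-rate example can be made concrete with a few lines of code. This sketch covers only the simple rescaling case, not a change of variational topology, and all specifics are assumptions: a genotype is a pair `(x, sigma)`, the GP-map discards `sigma`, and mutation adds Gaussian noise of width `sigma` to `x`.

```python
import random
from statistics import pstdev

def h(g):
    # hypothetical GP-map: the genotype is (x, sigma); only x reaches the phenotype
    x, sigma = g
    return x

def sample_kernel(g, rng, n=5000):
    """Sample the lifted kernel M_g: mutate in G, then map the mutant to P.
    For simplicity, sigma itself is not mutated here."""
    x, sigma = g
    return [h((x + rng.gauss(0.0, sigma), sigma)) for _ in range(n)]

rng = random.Random(0)
g1, g2 = (0.0, 0.1), (0.0, 2.0)  # two genotypes in the same neutral set
width1 = pstdev(sample_kernel(g1, rng))
width2 = pstdev(sample_kernel(g2, rng))
```

Both genotypes map to the phenotype `0.0`, yet their offspring distributions in P differ markedly in width, which is exactly the variation of kernels along a neutral set that Definition 4 asks for.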

In the following we will exclusively focus on self-adaptability of exploration as defined 
above. 
Note: Focusing only on self-adaptability (neglecting option (ii)), we want to emphasize that
we always consider the GP-map h to be fixed, i.e. non-varying during evolution, and that
this is not a restriction, not a loss of generality. If one were to protest and claim that h should
be variable by depending on genes in G, we would object that the formalism requires
collecting all genetic parameters in the space G, that by definition the GP-map h is the map
which maps all of G onto P, and thus that it is formally incorrect to speak of h as depending on genes
in G.

Of course, others may have another point of view and this does not diminish the profound 
meaning of, e.g., Wagner and Altenberg's statement that "the genotype-phenotype map is 
under genetic control and therefore evolvable." [sec 2, par 9] - though from our point of 
view a questionable formulation. 

4.1 Interpretations of neutrality 

It is intuitive to believe that every little detail in nature fulfills "some purpose"; evolution 
would abandon all useless mechanisms and redundancies. The existence of something like 
neutrality in nature offends this intuition: A typical example is the fact that different codons
are translated into the same amino acid, suggesting that certain nucleotide substitutions
have no effect whatever on the phenotype or its fitness; they are neutral. Such issues
initiated many investigations, pioneered by Motoo Kimura's Neutral Theory (Kimura 1983).
In a later paper (Kimura 1986), he defends his theory against the selectionists' criticism, who 
argued that neutral genes would be functionless, mere noise, and thus biologically implausible: 
"Sometimes, it is remarked that neutral alleles are by definition not relevant to adaptation, 
and therefore not biologically very important. I think that this is too short-sighted a 
view. Even if the so-called neutral alleles are selectively equivalent under a prevailing 
set of environmental conditions of a species, it is possible that some of them, when a 
new environmental condition is imposed, will become selected. Experiments suggesting 
this possibility have been reported by Dykhuizen & Hartl (1980) who called attention to 
the possibility that neutral alleles have a 'latent potential for selection'. I concur with 
them and believe that 'neutral mutations' can be the raw material for adaptive evolution." 
[Kimura (1986), page 345]
The previous section gave a clear statement of how neutrality can be understood as "raw material
for adaptive evolution".

The interplay between neutrality and evolvability is a central topic also in other works. 
Fontana and Schuster (1998), when investigating neutrality inherent in protein folding, claim 
that neutrality enables discontinuous transitions in the protein's shape space (the space P): 
"[Transitions] can be triggered by a single point mutation only if the rest of the sequence [point 
in G] provides the appropriate context [neighborhood in G]; they are preceded by extended 
periods of neutral drift." [last but one paragraph] Their arguments focus on the connectivity 
of neutral sets which can be analyzed theoretically by percolation theory. We agree on these 
generic ideas. A precondition is however that neutral sets exist and, most important, that 
exploration varies along these neutral sets — as we captured in the above definition. 
A very intriguing study of such phenomena in nature is the one by Stephens and Waelbroeck
(1999). They empirically analyze the codon bias and its effect in HIV sequences. Codon bias
means that, although there exist several codons that code for the same amino acid (which
form a neutral set), HIV sequences exhibit a preference for which codon is used to code for a
specific amino acid. More precisely, at some places of the sequence codons are preferred that
are "in the center of this neutral set" (with high neutral degree) and at other places codons
are biased to be "on the edge of this neutral set" (with low neutral degree). It is clear that
these two cases induce different exploration densities; the former case means low mutability
whereas the latter means high mutability. They go even further by giving an explanation for
these two (marginal) exploration strategies: Loci with low mutability (trivially) cause "more 
resistance to the potentially destructive effect of mutation" , whereas loci with high mutability 
might induce a "change in a neutralization epitope which has come to be recognized by the 
immune system." [introduction, par 4] 

Finally, several models of landscapes with tunable neutrality have been proposed to theo- 
retically investigate possible purposes of neutrality (Barnett 2000; Newman and Engelhardt 
1998; Reidys and Stadler 2001). 

In this paper we present a simple setup to demonstrate the dynamics in neutral networks 
in appendix B. Using Eigen's model we show a drift towards high neutral degree, i.e. towards 
representations of low mutability. This effect is important to understand the experiment we 
present in section 5.2. 

5 Paradigms of self-adaptive exploration 

The goal of this section is to exemplify the principles discussed above by simple and
transparent (artificial) systems. In order to set up a running system we need to make some
further decisions on

(i) the problem (the space P), 

(ii) the GP-map (including the choice of G), 

(iii) the base density (population size, mutation rates on G, etc.), 

(iv) the evaluation (implementation of E), 

(v) the update rule A. 

In the following P will simply be strings over some alphabet A; the problem is to minimize
the (Hamming) distance to a given target string. Concerning points (iv) and (v), we will use
rank-based selection, i.e. we evaluate proportionally to the rank of each individual and update
the population by sampling this evaluation density. Points (ii) and (iii) need more thorough
consideration and are treated in turn below.
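As a rough illustration of choices (i), (iv) and (v), the following sketch computes the Hamming distance to a target string and resamples a population from a rank-proportional evaluation density. The function names, the length penalty for strings of unequal length, and the linear rank weights are assumptions made for this example; the text only states that evaluation is proportional to rank.

```python
import random

def hamming(s, target):
    # Mismatching positions plus a length penalty. The penalty is an assumption
    # of this sketch so that insertions and deletions can be scored as well.
    return sum(a != b for a, b in zip(s, target)) + abs(len(s) - len(target))

def rank_density(population, target):
    """Evaluation density proportional to rank (worst = rank 1, best = rank n)."""
    worst_first = sorted(range(len(population)),
                         key=lambda i: hamming(population[i], target),
                         reverse=True)
    weights = [0.0] * len(population)
    for rank, i in enumerate(worst_first, start=1):
        weights[i] = float(rank)
    total = sum(weights)
    return [w / total for w in weights]

def resample(population, target, rng):
    """Update rule: draw the next population from the evaluation density."""
    return rng.choices(population,
                       weights=rank_density(population, target),
                       k=len(population))
```

For the population `["abc", "abd", "xyz"]` and target `"abc"`, the three individuals receive the weights 1/2, 1/3 and 1/6. Ties are broken by list order here, which is a simplification.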

A recursive, grammar-type GP-map. We decide to implement the GP-map as a recursive
mapping. More precisely, h is representable as a composition of a single GP-generator
ĥ : G → G,

$$h = \underbrace{\hat h \circ \cdots \circ \hat h}_{m(\cdot)\ \text{times}} : G \to P \subset G , \qquad g \mapsto \underbrace{\hat h \circ \cdots \circ \hat h}_{m(g)\ \text{times}}\,(g) . \tag{13}$$

This inevitably requires a choice of G such that P ⊂ G. The recursion depth m may depend
on the point g ∈ G. Generically, we require that each GP-generator affects (or entangles)
only a few variables within G. The motivation is as follows: Structuredness of exploration,
as discussed in sections 3.1 and 3.2, means mutual information between variables that belong
to the same phenotypic character and less mutual information otherwise. We want the generator
to represent elementary correlating effects (e.g. of interaction), i.e. to constitute elementary
modules. For example, an elementary correlating effect is that one character depends also on
another, and a respective generator would introduce such mutual information by mapping one
independent variable onto one which depends on other variables. An N-K reaction network
is a basic example: the generator (the time step transformation) entangles K variables to a
new one.

Our examples will use a grammar-type recursive mapping. The space G is organized as

$$G = P \times [A \times P]^r , \tag{14}$$

which means that g encompasses one structure g_0 ∈ P (called axiom) and r tuples g_i ∈ A × P
(called rules). The GP-map h applies to g ∈ G by applying all rules to g_0; the symbols l ∈ A
in each rule (actually the lhs label of a grammar rule) specify how to apply the rule. (The
GP-generator is the single application of one rule to the axiom.) In our examples, the recursion
depth m is always fixed (so we need no terminal symbols or other complicated mechanisms).

Such grammar-type encodings have been investigated in many other respects, e.g. by
Prusinkiewicz and Hanan (1989) and Prusinkiewicz and Lindenmayer (1990) discussing
L-systems as natural representation of highly regular, plant-like structures; by Kitano (1990),
Gruau (1995), Lucas (1995), and Sendhoff and Kreutz (1998) using grammar-encodings as
representation of neural networks. However, these approaches are neither based on nor
motivated by a discussion of self-adaptive exploration. Thus, although in most cases the
existence of neutral sets (equivalent representations) in grammar encodings is obvious, the
importance of introducing (neutral) variations that explore these existing neutral sets, and
thereby explore different exploration strategies, was not recognized and stressed. The next
paragraph concerns the introduction of such variations.

Neutral variations in grammar-type encodings. We turn to the choice of base density,
i.e. variability on G. We assume that there exist canonical mutations on P, namely flip (with
probability α per symbol), insertion, duplication and deletion (with probability γ per string).
Since G is composed of structures of P these mutations induce standard mutations on G.

However, to take all the considerations of section 4 into account, we additionally introduce
neutral variations on G. These variations are supposed to allow for self-adaptability as defined
above, i.e. they should allow neutral variations that vary exploration. In our examples we
realize such variations by rule substitutions and creations. Specifically we introduce five kinds
of variations of g ∈ G, which are likely to be neutral but need not always be:

(i) Pick one rule and one structure ∈ P (any rhs or the axiom) within g; then apply the
rule once to the structure.

(ii) Pick one rule and one structure; check if the rhs of the rule is part of the structure; if
so, replace this part by applying the rule inversely.

(iii) Pick a structure and create a new rule by extracting a part out of the structure and
replacing it by a symbol.

(iv) Delete a rule if it is never applied during recursion.

All of these variations will occur with probability β per rule (per structure in case (iii)).
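The variations (i) to (iv) can be sketched as simple string operations. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: a genotype is represented as a pair (axiom, list of rules), the GP-map applies every rule to the whole string a fixed number of times, and all function names are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str]               # (lhs label, rhs string)
Genotype = Tuple[str, List[Rule]]    # (axiom g_0, rules g_1 .. g_r)

def expand(g: Genotype, depth: int = 3) -> str:
    """GP-map sketch: apply every rule to the current string, `depth` times."""
    s, rules = g
    for _ in range(depth):
        for lhs, rhs in rules:
            s = s.replace(lhs, rhs)
    return s

def apply_once(structure: str, rule: Rule) -> str:
    """Variation (i): apply the rule to the first matching symbol."""
    lhs, rhs = rule
    return structure.replace(lhs, rhs, 1)

def apply_inverse(structure: str, rule: Rule) -> str:
    """Variation (ii): replace the first occurrence of the rhs by the lhs."""
    lhs, rhs = rule
    return structure.replace(rhs, lhs, 1)

def extract_rule(structure: str, start: int, length: int, label: str):
    """Variation (iii): cut out a part and replace it by a new symbol."""
    part = structure[start:start + length]
    new_structure = structure[:start] + label + structure[start + length:]
    return new_structure, (label, part)

def delete_unused(g: Genotype) -> Genotype:
    """Variation (iv), approximated: drop rules whose label occurs nowhere in g."""
    axiom, rules = g
    body = axiom + "".join(rhs for _, rhs in rules)
    return axiom, [rule for rule in rules if rule[0] in body]

def description_length(g: Genotype) -> int:
    """Sum of the lengths of the axiom and of all rhs."""
    axiom, rules = g
    return len(axiom) + sum(len(rhs) for _, rhs in rules)
```

Starting from the axiom `"abcde" * 5` without rules, extracting the first block as a rule `f -> abcde` and applying that rule inversely four times gives the axiom `fffff`. The phenotype returned by `expand` stays the same throughout, while the description length drops from 25 to 10.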

5.1 Basic paradigm 

Let P be strings of the alphabet {0, 1, x}. Consider the following two points a, b ∈ G to
represent the same point 0101 in P:

a_0 = 01x ,  a_1 = (x → 01) ,
b_0 = xx ,   b_1 = (x → 01) .

If we assume that the rhs of a_1 and b_1 have considerable mutability, the exploration kernels of
a and b are quite different: Probable (phenotypic) mutants of a are 0111, 0100, 0110, whereas
b is likely to produce mutants like 1111, 0000, 1010. The difference of these two exploration
densities is of topological nature.

In order to enable a transition between such different strategies, the exploration of the
corresponding neutral set must be possible. In the above example it is easy to define a
neutral mutation from b to a: the rule itself is applied once to the axiom. The inverse mutation,
from a to b, requires an application of the rule from right to left, i.e., see if the rhs fits
somewhere and substitute it by the lhs. Our system incorporates these variations.
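The example can be checked with a few lines of code. The sketch below assumes that the GP-map replaces every occurrence of a rule's lhs by its rhs, and that a mutant is obtained by replacing a rule's rhs with any other binary string of the same length; the names `expand` and `rhs_mutants` are hypothetical.

```python
from itertools import product

def expand(axiom, rules, depth=3):
    """GP-map sketch: apply every rule (lhs, rhs) to the string, `depth` times."""
    s = axiom
    for _ in range(depth):
        for lhs, rhs in rules:
            s = s.replace(lhs, rhs)
    return s

def rhs_mutants(axiom, rules, k=0, symbols="01"):
    """Phenotypes reached when the rhs of rule k mutates to another string."""
    lhs, rhs = rules[k]
    mutants = set()
    for alt in product(symbols, repeat=len(rhs)):
        alt = "".join(alt)
        if alt != rhs:
            mutated = rules[:k] + [(lhs, alt)] + rules[k + 1:]
            mutants.add(expand(axiom, mutated))
    return mutants

a = ("01x", [("x", "01")])
b = ("xx", [("x", "01")])

# Both genotypes encode the same phenotype ...
assert expand(*a) == expand(*b) == "0101"
# ... but their rhs mutations reach different phenotypes.
assert rhs_mutants(*a) == {"0111", "0100", "0110"}
assert rhs_mutants(*b) == {"1111", "0000", "1010"}
# Neutral transitions between the two axioms.
assert "xx".replace("x", "01", 1) == "01x"    # rule applied once: b -> a
assert "01x".replace("01", "x", 1) == "xx"    # rule applied inversely: a -> b
```

The two mutant sets are disjoint, which illustrates the sense in which the two exploration kernels differ although a and b are phenotypically identical.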

5.2 Two experiments: Variability of exploration and neutral drift 

Let P be the strings over the alphabet {a, b, c, d, e, f, g, h}. The function f is the Hamming
distance to the fixed target string abcdeabcdeabcdeabcdeabcde, i.e. 5 times abcde. To
demonstrate a neutral drift we consider only one individual and initialize it with an axiom
equal to the target and no rules. Selection is (1+1), i.e. at each time step one offspring is
produced and selected if equally good or discarded if worse. As a result of neutral variations, 
the number of rules and the probability for regular mutations in the exploration density vary 
in correlation. This kind of variability of exploration is of topological nature. The point is, 
we gave an example where the topological characters of the exploration density vary over a 
connected neutral set. See figure 3. 
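A (1+1) neutral drift of this kind can be sketched as follows. This is a simplified illustration rather than the experimental code: variations act on the axiom only, the spare symbols f, g, h are assumed to serve as rule labels, parts of up to five symbols are extracted, and all names are hypothetical. Flip, insertion, duplication and deletion mutations are left out.

```python
import random

TARGET = "abcde" * 5
LABELS = "fgh"  # assumption: unused alphabet symbols serve as rule labels

def expand(axiom, rules, depth=10):
    """GP-map sketch: apply all rules to the string, `depth` times."""
    s = axiom
    for _ in range(depth):
        for lhs, rhs in rules.items():
            s = s.replace(lhs, rhs)
        if len(s) > 4 * len(TARGET):  # guard against runaway self-reference
            break
    return s

def distance(axiom, rules):
    """Hamming distance to the target, plus a penalty for a wrong length."""
    s = expand(axiom, rules)
    return sum(x != y for x, y in zip(s, TARGET)) + abs(len(s) - len(TARGET))

def vary(axiom, rules, rng):
    """Propose one variation of kind (i)-(iv), acting on the axiom only."""
    rules = dict(rules)
    kind = rng.choice(["apply", "inverse", "extract", "delete"])
    if kind == "extract":
        free = [label for label in LABELS if label not in rules]
        if free:
            i = rng.randrange(len(axiom))
            part = axiom[i:i + rng.randint(2, 5)]
            rules[free[0]] = part
            axiom = axiom[:i] + free[0] + axiom[i + len(part):]
    elif kind == "delete":
        body = axiom + "".join(rules.values())
        unused = [label for label in rules if label not in body]
        if unused:
            del rules[rng.choice(unused)]
    elif rules:
        lhs = rng.choice(sorted(rules))
        if kind == "apply":
            axiom = axiom.replace(lhs, rules[lhs], 1)
        else:
            axiom = axiom.replace(rules[lhs], lhs, 1)
    return axiom, rules

def drift(steps=2000, seed=0):
    """(1+1) selection: the offspring replaces the parent if it is not worse."""
    rng = random.Random(seed)
    axiom, rules = TARGET, {}
    lengths = []
    for _ in range(steps):
        candidate = vary(axiom, rules, rng)
        if distance(*candidate) <= distance(axiom, rules):
            axiom, rules = candidate
        lengths.append(len(axiom) + sum(len(r) for r in rules.values()))
    return axiom, rules, lengths
```

Because the initial individual already has distance zero, only neutral variations are accepted, so the phenotype stays equal to the target while the number of rules and the description length recorded in `lengths` wander along the neutral set.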

Figure 3: A single individual is tracked when drifting on a neutral set spanned by neutral
substitutions in its grammar-encoding. Its exploration density is analyzed by taking 10 000
samples at each time step. 'Modular exploration' counts the probability for mutations that
occur equally at same positions in other blocks. These are blocks of 5 symbols as given by the
target string: 5 x abcde. 'Rule usage' counts how often rules are applied during recursion.
[Population size μ = 1; mutation probabilities α = 0.001, β = 0.1; recursion depth m = 10;
scaling of y-axes is only relative.]

We enhance this example by considering a population of 100 individuals and non-elitist,
rank-based selection. All individuals are initialized as described above. The population drifts
towards representations (points in the neutral set of the target string) with high neutrality.
This effect is explained in detail in appendix B. Here, a high neutral degree coincides with
representations of short description length (the sum of lengths of the axiom and rhs of rules).
In order to achieve such compact representations, more rules are extracted and included in
the representation. A visualization of the exploration density via mutual information maps
exhibits its clear structure that corresponds to the target string's structure. One may interpret
that the system has "learned the problem's structure". See figure 4.
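A mutual information map of this kind can be estimated from samples of the exploration density. The sketch below is an assumption-laden illustration: only flip mutations of non-label symbols are sampled, the flip probability is chosen freely, and the helper names are hypothetical. It compares a direct encoding of the target with the compact encoding `fffff`, `f -> abcde`.

```python
import math
import random
from collections import Counter

TERMINALS = "abcde"  # assumption: only non-label symbols are flipped

def expand(axiom, rules, depth=3):
    """GP-map sketch: apply all rules to the string, `depth` times."""
    s = axiom
    for _ in range(depth):
        for lhs, rhs in rules.items():
            s = s.replace(lhs, rhs)
    return s

def flip(s, alpha, rng):
    """Flip each terminal symbol with probability alpha to another terminal."""
    out = []
    for ch in s:
        if ch in TERMINALS and rng.random() < alpha:
            ch = rng.choice([t for t in TERMINALS if t != ch])
        out.append(ch)
    return "".join(out)

def sample_phenotypes(axiom, rules, alpha, n, rng):
    """Draw n phenotypes from the exploration density of one genotype."""
    samples = []
    for _ in range(n):
        mutated = {lhs: flip(rhs, alpha, rng) for lhs, rhs in rules.items()}
        samples.append(expand(flip(axiom, alpha, rng), mutated))
    return samples

def mutual_information(samples, i, j):
    """Empirical mutual information (in bits) between positions i and j."""
    n = len(samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    pij = Counter((s[i], s[j]) for s in samples)
    return sum(c / n * math.log2(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def mi_matrix(samples):
    """Matrix of pairwise mutual information, one row per phenotypic variable."""
    size = len(samples[0])
    return [[mutual_information(samples, i, j) for j in range(size)]
            for i in range(size)]
```

With the compact encoding, positions that sit at the same place in different blocks (for example 0 and 5) always mutate together and show high mutual information, whereas positions within a block remain nearly independent. With the direct encoding all off-diagonal entries of `mi_matrix` are close to zero.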
Figure 4: Upper plot: A population is tracked when drifting on a neutral set. Selection is
non-elitist and rank-proportional and thus pushes the population towards higher neutral degree.
This is achieved by finding representations of shorter description length of the modular target
string (5 x abcde). (Description length equals the sum of the lengths of axiom and all rhs.)
This in turn is achieved by making use of rules. Lower plots: The mutual information between
the 25 variables in the (phenotypic) exploration density is displayed as a matrix. The three
plots correspond to times 50, 500, and 2000. The regular, 5-modular structure of exploration
is clearly visible. [Population size μ = 100; mutation probabilities α = β = 0.02; target: 5 x
abcde; recursion depth m = 3; scaling of y-axes is exact for neutrality, only relative for the
rest; scaling of gray-shading is only relative.]
6 Conclusions

Major parts of this paper are concerned with developing an integrated language for evolutionary
search based on the formalism of stochastic search and emphasizing the exploration density
and its parameterization. The benefit is a unified view on different specific approaches, their
commonalities and differences. For example, at first sight it is hard to see what a CMA evolution
strategy has in common with the codon bias in HIV sequences. The answer is: both
of them are concerned with modeling the variability of future offspring, the exploration density;
both of them do so by using a kind of genotype-phenotype mapping (an affine transformation in
the first case). Also notions such as pleiotropy and functional phenotypic complex can properly
be defined on the basis of this language. This allows contact to be made between biological and
computational research. The functional meaning of a genotype-phenotype mapping is illuminated
by interpreting it as a lift of an exploration density and topology on the search space.
We showed that a non-injective genotype-phenotype mapping can lift different exploration 
strategies, different topologies to the same phenotype. This is the core of how we define 
self-adaptability of exploration. The definition overcomes the formal weakness of previous 
definitions and is as general as the language it is based on. The definition opens a completely 
new view on the meaning of neutrality. 

In the experimental part of this paper we presented elementary examples of these concepts. 
We illustrated the structure of exploration by a gray-shade map of the mutual information 
within the exploration density, a gray-shade map of pleiotropy. We exemplified its variability 
during neutral drifts. And we demonstrated successful self-adaptability of exploration where 
in the end the structure of exploration perfectly matches the structure of the problem. 

We will now discuss some further implications of the new view we have developed in this 
paper: 

(i) On modularity, structuredness, and evolvability. Given a system that functions 
well, how should one define what a module or a functional complex is? One only observes that 
all parts together work well as a whole. A common idea is that modules are characterized by 
high interactivity within them. By high interactivity we mean that there are high correlations 
between units during the time of functioning. These are completely different kinds of correlations
than correlations between units in the evolutionary variability. It is nevertheless possible
to draw a link: Having units that are highly interacting during functioning, the fitness might 
strongly depend on their teamwork. If this is the case, also the evaluation density should 
incorporate high correlations between the units (i.e. the units form a functional phenotypic 
complex). Now, if the exploration density should approximate the evaluation density, we also 
find these correlations in the evolutionary variability. 

Thus, when talking about modules, one should be aware of the interrelations between these 
three levels of correlations: (1) during functioning, (2) in the evaluation density, (3) in the 
exploration density. Our definition of a functional phenotypic complex refers to the 2nd level 
- the evaluation density. Our hypothesis is that the advantage of structured systems (and
thus the selective pressure towards structure) stems from the 3rd level:

Systems are structured, not because this is the only possible way of functioning, but because 
it is advantageous for variability. The advantage of structured variability is its capability to 
explore by approximating the "problem's structure", the structure of the evaluation density. 

This capability should be called evolvability. 

For example, parts of a system that contribute separately to fitness should be varied and
optimized in parallel without potentially disturbing correlations; whereas parts of a system
that only contribute to fitness when they are tuned to each other should be varied in
correlation in order to preserve this tuning.

(ii) On redundancy and neutrality. Neutrality is often thought of as redundancy. From
our point of view, this is very misleading. As we pointed out in the context of self-adaptability,
although all the genotypes in a neutral set encode the same phenotype, they may have very 
different exploration kernels. Thus, such genotypes may carry different information. One 
cannot speak of redundancy if different and relevant information is encoded. If, however, 
genotypes in a neutral set have identical exploration kernels (in the genotype space), then 
they are indeed redundant. Redundancy is necessarily neutral, but neutrality is not necessarily 
redundant. 

(iii) On compact representations. Assume we use a Bayesian network to model the structure
of exploration. Then we will explicitly encode the correlations between all phenotypic
variables. In contrast, our second example shows how compact representations correspond to 
highly structured exploration and can be found by using recursive codings. The idea is that 
each recursion introduces correlations in the variables. The neutral drift towards high neutral 
degree (see appendix B) induces a selective pressure towards short representations. 

(iv) On grammar-type encodings. In grammar-type encodings, some single genotypic
variables (genes) might effectively represent whole groups of phenotypic variables. Thus,
when we model dependencies between variables, we can also model dependencies between
whole groups of phenotypic variables and not only between single phenotypic variables as in
the direct modeling ansatz. This allows deep hierarchical dependencies to be introduced in the
exploration density.

Most existing approaches to grammar encoding are motivated by the fact that grammars 
can represent regular structures with short description length. Instead, we claim that the most 
interesting point about grammars is their capability to introduce structure in the variability, 
as demonstrated in our examples. In order to explore these capabilities in a self-adaptive 
manner, the inclusion of neutral variations in recursive or grammar-type encodings is of 
crucial importance. This point seems neglected in the literature. 

We rigorously support Kimura's "belief that 'neutral mutations' can be the raw material 
for adaptive evolution" (Kimura 1986). 
A The exploration model of different state-of-the-art evolutionary algorithms

To stress the importance of the concept of exploration modeling we want to show that the 
main difference between specific evolutionary algorithms is their ansatz to model exploration. 
In order to do so, we embed specific algorithms in our formalism. In particular we chose 
to analyze the CMA algorithm and three recent approaches which belong to the class of 
"probabilistic model-building genetic algorithms" (PMBGAs), see (Pelikan, Goldberg, and 
Lobo 1999). All of these realize adaptive (but not self-adaptive) exploration. 

Covariance Matrix Adaptation (CMA), (Hansen and Ostermeier 2000). The search
space is continuous, P = ℝ^n. The CMA algorithm maintains as parameters q only one
(center of mass) point p ∈ P, the symmetric covariance matrix C, and some adaptation rate
parameters. The exploration density M_q is given by a linear transformation (via C) of a
Gaussian distribution around p. In practice, the algorithm generates λ normally distributed
mutation vectors ∈ ℝ^n, transforms all of these vectors by multiplying with the matrix C, and
adds these vectors to the center of mass p in order to generate the λ new samples. After
evaluation of the samples, q is updated as follows: p is moved to the center of mass of the
selected samples and C is adapted as

$$C^{(t+1)} = (1 - c)\, C^{(t)} + c\; z \otimes z . \tag{15}$$

Here, c is some adaptation constant and z is the average³ of the selected mutation vectors.
(Hansen and Ostermeier (2000), Eq. 15, write z z^T instead of z ⊗ z.) The point is that z ⊗ z
is the unique symmetric matrix which maps the equally distributed vector y = .., ^) to z. 
Thus, the update rule for C corresponds to our generic approaching update whereas p adopts 
the new center of mass. 
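One generation of this scheme, as described in the text, can be sketched in plain Python. This is not the CMA algorithm of Hansen and Ostermeier in full: step-size control and the time-weighted average mentioned in the footnote are omitted, z is taken as the plain mean of the selected (transformed) mutation vectors, and all function names are assumptions of this sketch.

```python
import random

def sample_offspring(p, C, lam, rng):
    """Draw lam samples p + C xi with xi ~ N(0, I), following the text."""
    n = len(p)
    mutations = []
    for _ in range(lam):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mutations.append([sum(C[i][k] * xi[k] for k in range(n))
                          for i in range(n)])
    samples = [[p[i] + m[i] for i in range(n)] for m in mutations]
    return samples, mutations

def mean(vectors):
    """Component-wise average of a list of vectors."""
    n = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]

def update_covariance(C, z, c):
    """Eq. (15): C <- (1 - c) C + c z (x) z, a symmetric rank-one update."""
    n = len(z)
    return [[(1 - c) * C[i][j] + c * z[i] * z[j] for j in range(n)]
            for i in range(n)]

def generation(p, C, f, lam, mu, c, rng):
    """Sample, select the best mu of lam, move p and adapt C."""
    samples, mutations = sample_offspring(p, C, lam, rng)
    best = sorted(range(lam), key=lambda i: f(samples[i]))[:mu]
    z = mean([mutations[i] for i in best])       # simplification: plain mean
    new_p = mean([samples[i] for i in best])     # new center of mass
    return new_p, update_covariance(C, z, c)
```

For example, `update_covariance([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], 0.5)` keeps the variance along the selected direction and halves the variance orthogonal to it.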

Dependency tree modeling, (Baluja and Davies 1997). Here, the search space is discrete,
P = X^n. In their algorithm, the parameter q that describes the next exploration density is
a dependency tree. Thus, the model is restricted to encode only pair-wise dependencies
between variables. At each time step, λ samples are generated from this exploration density;
the samples are evaluated and the best μ of them are selected. A probability density A
of previously selected points is adapted by including those newly selected ones (generically 
A <— (1 — a) A+a [E (t)\u)- Then the dependency tree is updated by minimizing the Kullback- 
Leibler divergence between A and M_q. The tree's update is an adopting update since it
approximates A, whereas A itself is updated according to an approaching update.

Factorized Distribution Algorithm (FDA), (Mühlenbein, Mahnig, and Rodriguez 1999).
Again, P = X^n is discrete. The parameters q describe the conditional dependencies in pairs,
triples, quadruples, etc. of variables. (To be exact, the algorithm comprises also some elitists.)
The model is quite general but it relies on pre-fixed knowledge on which pairs, triples, etc.
exactly are to be parameterized. At each time step, the dependencies within the distribution
of evaluated and selected points are calculated and assigned to q. Therefore, this is an
adopting update.

³ More exactly, a weighted average trace over time; see (Hansen and Ostermeier 2000), Eq. 14.

Bayesian Optimization Algorithm (BOA), (Pelikan, Goldberg, and Cantú-Paz 2000).
P = X^n is discrete. Here, q is a general Bayesian dependency network that explicitly encodes
the exploration density. Thus, the model is not limited in representing arbitrary orders
of correlation and it is flexible in which variables are dependent by inserting and deleting
connections in the network. After selection, the network is recalculated in order to minimize
(e.g. with a greedy algorithm) the distance (e.g. with respect to the Bayesian Dirichlet metric)
between M_q and the distribution of selected points. This is, except for elitists, also an
adopting update.



B Illustrating neutral dynamics 

As an illustration of neutral dynamics we present a simple example. We assume that the
search space P is discrete and rather small, |P| = A. A denotes the space of densities over P,
which actually is a simplex. Parameter q ∈ Q is such a density, Q = A, and the exploration
density M_q is a mutation τq ∈ A of this density. This example omits sampling and thus
evaluation E : A → A directly applies to M_q = τq. The update rule is the adopting update:

$$q^{(t+1)} = E\,\tau\,q^{(t)} , \qquad q_i^{(t+1)} = \sum_{j,k=1}^{|P|} E_{ij}\,\tau_{jk}\,q_k^{(t)} , \tag{16}$$

whereby we actually formulated Eigen's model (see e.g. (Eigen, McCaskill, and Schuster 1989))
in our notation. Finding the eigenvectors of Eτ means finding a stationary population density.
Their eigenvalues describe their growth factor and the eigenvector with highest eigenvalue will
describe the final attractor, the quasi-species. In the presence of a neutral set N (here a set
of indices) we assume that only individuals on this neutral set are evaluated positively and
without co-evolutionary (interacting) effects, i.e., E is diagonal and

$$E : A \to A , \qquad p_i \mapsto \sum_j E_{ij}\,p_j = E_{ii}\,p_i = \begin{cases} e_i(p)\,p_i & \text{if } i \in N \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

We investigate two options for the evaluation factor e_i(p). The first and straightforward
option is that all positions on the neutral set are evaluated equally, then

$$e^1_i(p) = \frac{1}{\sum_{j \in N} p_j} \tag{18}$$

is just the appropriate normalization factor. This option is realized e.g. for fitness-proportional
evaluation (when fitness on P \ N vanishes) but also for fair ranking. For the second option
we reinforce such positions on the neutral set with low neutral degree, inverse-proportionally
to the neutral degree:

$$e^2_i(p) = \frac{1/d_i}{\sum_{j \in N} p_j / d_j} , \qquad d_i = \sum_{k \in N} \tau_{ik} . \tag{19}$$

The quantity d_i is the probability for an offspring of individual i to be an element of N. Thus,
this option increases the evaluation of i such that the probability to provide an offspring in
N becomes equal for all i ∈ N. This can be compared to a local conservation of population
density: Effectively, each parent in N will with equal probability contribute a viable offspring
to the next generation. Such a type of selection can be realized by local selection mechanisms:
From each parent produce many offspring, let only the best of these offspring compete
with others. As a result, the quasi-species is simply constant on N and vanishes elsewhere,
q_i = 1/|N| for i ∈ N and q_i = 0 for i ∉ N:

$$p_i = (\tau q)_i = \sum_{j \in N} \tau_{ij}\,\frac{1}{|N|} = \frac{d_i}{|N|} . \tag{20}$$

The mutated density p_i is proportional to d_i (which, for individuals out of N, does not denote
the neutral degree but rather the probability for offspring in N). Diversity is much higher
than for the first type of evaluation. See figure 5.

Figure 5: The search space P is represented as a 10x10 board. The neutral set is embedded
as depicted on the left. The exploration matrix τ corresponds to a mutation rate of 0.1 in
each of the four directions (up, down, right, left). In the first experiment, when evaluation is
straightforward, e.g. fitness-proportional, it is impressive to see how strong the attraction
towards the crossing with neutral degree 1 (with four neutral neighbors) is. In the second
experiment, where evaluation enforces a kind of local conservation of population density, the
population is equally distributed on the neutral set, but exploration on places with high
neutral degree is proportionally higher because they have more neighbors from which they
"receive" offspring.
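The two evaluation options can be compared numerically by iterating the update (16) on a small board. The sketch below is illustrative only: the shape of the neutral set in the figure is not reproduced here, so a hypothetical cross-shaped set is assumed, mass that would leave the board is assumed to stay in place, and all names are invented for this example.

```python
SIZE = 10
RATE = 0.1  # mutation probability in each of the four directions
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

# Hypothetical cross-shaped neutral set with its crossing at (5, 5).
NEUTRAL = {(5, c) for c in range(2, 9)} | {(r, 5) for r in range(2, 9)}

def mutate(q):
    """Apply tau: keep 1 - 4*RATE of the mass, send RATE to each neighbor.
    Mass that would leave the board stays where it is (an assumption)."""
    p = {}
    for (r, c), mass in q.items():
        p[(r, c)] = p.get((r, c), 0.0) + (1 - 4 * RATE) * mass
        for dr, dc in STEPS:
            cell = (r + dr, c + dc)
            if not (0 <= cell[0] < SIZE and 0 <= cell[1] < SIZE):
                cell = (r, c)
            p[cell] = p.get(cell, 0.0) + RATE * mass
    return p

def neutral_degree(cell):
    """d_i: probability that an offspring of `cell` lies in the neutral set."""
    offspring = mutate({cell: 1.0})
    return sum(mass for pos, mass in offspring.items() if pos in NEUTRAL)

DEGREE = {cell: neutral_degree(cell) for cell in NEUTRAL}

def evaluate(p, option):
    """Eq. (18) for option 1, eq. (19) for option 2; zero outside N."""
    if option == 1:
        w = {i: p.get(i, 0.0) for i in NEUTRAL}
    else:
        w = {i: p.get(i, 0.0) / DEGREE[i] for i in NEUTRAL}
    total = sum(w.values())
    return {i: v / total for i, v in w.items()}

def quasi_species(option, iterations=3000, start=(5, 2)):
    """Iterate q <- E tau q from a single individual at `start`."""
    q = {start: 1.0}
    for _ in range(iterations):
        q = evaluate(mutate(q), option)
    return q
```

Under these assumptions, `quasi_species(1)` concentrates on the crossing, where the neutral degree is 1, while `quasi_species(2)` converges to the constant density 1/|N| on the neutral set, in line with the reasoning that leads to equation (20).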

The first experiment is an explanation for the dynamics we observe in section 5.2. We 
included the second experiment because it realizes what one might intuitively have expected: 
on a neutral set the population is distributed equally and with high diversity. We showed 
what kind of evaluation one has to choose to fulfill this expectation. 

The findings conform to Nimwegen's (1999) small examples of random or selective
walks on a neutral set: A blind ant would try one (random) neighboring genotype and walk
to it if it has the same fitness or stay otherwise. A myopic ant would find all neighbors with
the same fitness and walk to one (random) of those. He finds that, in temporal average, the
blind ant stays equal times at each genotype of the neutral set whereas the myopic ant stays
longer at centers of the neutral set (i.e. ∝ the neutral degree). The myopic ant, since it always
finds a neutral neighbor, corresponds to our second example.
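The two walks can be reproduced with a short calculation. Instead of simulating single ants, the sketch below propagates the occupation probabilities and averages them over time; the cross-shaped neutral set on a grid with four neighbors per genotype and all function names are assumptions of this example.

```python
# Hypothetical cross-shaped neutral set on a grid with 4 neighbors per cell.
NEUTRAL = {(5, c) for c in range(2, 9)} | {(r, 5) for r in range(2, 9)}

def neutral_neighbors(cell):
    """Grid neighbors of `cell` that belong to the neutral set."""
    r, c = cell
    return [n for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
            if n in NEUTRAL]

def blind_step(dist):
    """Blind ant: try one of the 4 neighbors at random, move only if neutral."""
    new = dict.fromkeys(NEUTRAL, 0.0)
    for cell, mass in dist.items():
        nbrs = neutral_neighbors(cell)
        for n in nbrs:
            new[n] += mass / 4
        new[cell] += mass * (4 - len(nbrs)) / 4
    return new

def myopic_step(dist):
    """Myopic ant: move to one of the neutral neighbors, chosen at random."""
    new = dict.fromkeys(NEUTRAL, 0.0)
    for cell, mass in dist.items():
        nbrs = neutral_neighbors(cell)
        for n in nbrs:
            new[n] += mass / len(nbrs)
    return new

def time_average(step, start=(5, 2), steps=10000):
    """Average occupation probability of each genotype over `steps` moves."""
    dist = dict.fromkeys(NEUTRAL, 0.0)
    dist[start] = 1.0
    avg = dict.fromkeys(NEUTRAL, 0.0)
    for _ in range(steps):
        for cell in NEUTRAL:
            avg[cell] += dist[cell] / steps
        dist = step(dist)
    return avg
```

On this set the blind ant's time average approaches 1/13 for every genotype, whereas the myopic ant's time average approaches the number of neutral neighbors divided by 24, i.e. 4/24 at the crossing and 1/24 at the four end points.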